The following report presents a quantitative text analysis of the headlines published by “The Guardian” between 18 October 2021 and 28 November 2021. It applies sentiment analysis to the coverage surrounding the COP26 negotiations, which took place in Glasgow from October 31st to November 12th. This descriptive analysis aims at finding possible changes in the opinions and/or political positioning of the abovementioned media outlet.
The 26th Conference of the Parties, which took place in Glasgow, represented a crucial moment for climate policy negotiations.
Most of the world's most influential leaders attended the event to discuss future global action on climate mitigation and adaptation, together with non-state actors and internationally renowned personalities. The occasion gained substantial media attention from all over the world, with peaks right before the start of the COP (the so-called ‘PreCOP’ events), during the Conference itself, and right after its conclusion.
However, media outlets approach climate change in different ways that reflect their political positioning: the headlines, the highlights and the frequently mentioned topics differ accordingly.
We collect the headlines of a British newspaper to analyse possible trends and changes in sentiment over a timeframe that runs from the weeks right before the Conference to the period right after it.
As public policy students concerned about climate negotiations, we are interested in investigating the opinions and attitudes expressed by media outlets in the above-mentioned periods of time. We are mainly interested in whether the standpoint and the perspective of media outlets changed over time, and in how this change developed.
The relevance of our analysis lies in our curiosity about newspapers’ behavior around critical international occasions such as COP26. Understanding whether they inform people while adding a particular sentiment (which could also reflect a general feeling among readers), or whether they prefer to remain neutral and report factual events objectively, could foster a deeper comprehension of the role of information and media in climate change developments.
To accomplish that, we decided to analyse the sentiment of a single media outlet published in the COP26 host country, the UK: The Guardian. This newspaper is considered left-leaning, according to YouGov findings. Topics such as climate financing were among the most critical ones on the table at the COP26 negotiations, so choosing a non-neutral outlet, which would be expected to endorse such topics, can show more compelling results in terms of changes in political positioning with respect to the outcomes of the Conference, changes that may in turn be reflected in the headlines.
The principal objective of this research is to analyse trends in the attitude of the newspaper headlines with respect to COP26 topics. The further questions fall into two macro areas. The first one tackles the original and main interest, that is, whether the ratio of positive to negative words changes over time and, if so, in which period. Questions related to this area are:
The second area concerns a more specific analysis that also takes into account how such results change when different measurement instruments, in this case dictionaries, are used. Questions related to this area are:
The text analysis is performed mainly using two different packages: tidytext and quanteda. The tidytext package follows the tidyverse design philosophy and processes texts in their character format, stored in data frames, while quanteda works with its own corpus objects, as is typical of NLP tooling. We employed both tools to address each of our research questions in the most appropriate way. Specifically, tidytext made it simpler to build analyses and visualizations with dates than the quanteda document-level variables did. The quanteda package was instead particularly useful for the targeted sentiment analysis we conducted, and it made it possible to check the consistency of results with another dictionary, the LSD2015 one. The keywords-in-context function was also used as an exploratory tool.
Because it is a newspaper from the host country of the climate negotiations, The Guardian is not an ideal sample of headlines from which to deduce, through sentiment analysis, whether COP26 met expectations. Indeed, the results only show the changes in opinion for the specific political leaning that this outlet represents. However, the aim of this project is to apply sentiment analysis procedures to information scraped from the web and to present the results to the user in an accessible format. It is therefore necessary to acknowledge the very limited scope of this analysis: its findings apply only to this specific and small sample.
A further limitation concerns the dates that have been scraped from The Guardian website. Given the web-scraping strategy used, the most recent dates (end of November and December 2021) present some missing values caused by a heterogeneous format in the website pages. For demonstration purposes we simply dropped those missing values, further limiting the scope of the analysis.
The webscraping, cleaning and formatting section of the analysis can be found in the R script scraping_and_data_cleaning that is available in the repository.
The webscraping strategy adopted consists of downloading the headlines from multiple pages of the newspaper website by date (static webscraping). The formatting step includes converting dates into the correct format with lubridate and preparing the data for the quantitative text analysis with tidytext. In this part, words describing the main topic of the headlines (“cop26”, “glasgow”, “climate”, “change”) were expected to be very frequent without contributing to a specific sentiment, so they have been removed as stopwords.
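The formatting step described above can be sketched as follows. This is a minimal illustration rather than the content of the scraping_and_data_cleaning script: the two headlines, the column names `date` and `headline`, the date format and the use of tidytext's built-in `stop_words` list are all assumptions made for the example.

```r
library(dplyr)
library(lubridate)
library(tidytext)

# Invented headlines standing in for the scraped data (assumption).
headlines <- tibble(
  date = c("31 October 2021", "1 November 2021"),
  headline = c(
    "Cop26 opens in Glasgow amid climate protests",
    "World leaders promise action on climate change"
  )
)

# Topic words removed as custom stopwords, as listed in the text.
custom_stopwords <- tibble(word = c("cop26", "glasgow", "climate", "change"))

tidy_headlines <- headlines %>%
  mutate(date = dmy(date)) %>%              # parse day-month-year strings into Date
  unnest_tokens(word, headline) %>%         # one lowercased token per row
  anti_join(stop_words, by = "word") %>%    # drop general English stopwords
  anti_join(custom_stopwords, by = "word")  # drop the topic words

tidy_headlines
```

The result has one row per remaining word, with the headline's date carried along, which is the shape the later dictionary joins rely on.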
Through the exploration of the collected data, we aim at understanding which are the most frequent words and whether they could have a role in our investigation.
Thanks to a frequency table and an explorative WordCloud, we visualize the most frequent words. We identify ‘crisis’ as the most frequent word (other than the customized stopwords) used in the headlines during the COP26 period.
Using the keyword-in-context table, we quickly checked whether the word ‘crisis’ ever plays a role other than being part of the ‘climate crisis’ bigram. This is not the case. Since the main topic of COP26 is precisely that of ‘tackling the climate crisis’, this word, despite clearly indicating a negative sentiment, does not carry relevant information. It is therefore dropped.
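A keyword-in-context check of this kind could look like the sketch below, using quanteda's `kwic()`. The two headlines are invented for illustration and the window size of three tokens is an arbitrary choice.

```r
library(quanteda)

# Invented headlines for illustration (assumption).
headlines <- c(
  "Leaders urged to act on the climate crisis",
  "Energy crisis looms over the talks"
)

toks <- tokens(headlines, remove_punct = TRUE)

# Show every occurrence of "crisis" with three tokens of context on each side.
crisis_kwic <- kwic(toks, pattern = "crisis", window = 3)
crisis_kwic
```

Reading the left-hand context column is enough to see whether ‘crisis’ is always preceded by ‘climate’ or also appears in other expressions.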
| word | n |
|---|---|
| crisis | 58 |
| net | 52 |
| world | 50 |
| video | 48 |
| australia | 41 |
| johnson | 40 |
| happened | 38 |
| global | 34 |
| boris | 33 |
| emissions | 32 |
The sentiment analysis applied to the collected headlines is conducted using a dictionary-based method. The three dictionaries used are:
‘Bing et al.’
‘AFINN’
‘Lexicoder Sentiment Dictionary’ (LSD2015)
The choice of these dictionaries is mainly based on common practice and on the objective of our research to check the sentiment of the headlines around the climate negotiations, quantify them and detect any potential patterns and the consistency of these results.
From the tidytext package, we use the ‘Bing et al.’ and the ‘AFINN’ dictionaries. These are general-purpose lexicons based on unigrams (single words). The first one classifies words as negative or positive, while the second one scales the sentiment by assigning each word an integer value between -5 (very negative) and +5 (very positive).
From the quanteda package, the Lexicoder Sentiment Dictionary is a valid alternative, because it is particularly well suited to sentiment analysis of political communication (Young, L. & Soroka, S., 2012). This dictionary consists of 2,858 ‘negative’ sentiment words and 1,709 ‘positive’ sentiment words. The novelty of the Young and Soroka approach lies in a further set of 2,860 and 1,721 negations of negative and positive words, respectively. However, we did not find this additional set useful for our research purposes.
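As a rough sketch of how the LSD2015 dictionary can be applied with quanteda while leaving out the negation sets, the example below keeps only the first two keys of `data_dictionary_LSD2015` (‘negative’ and ‘positive’). The headlines and document names are invented for illustration.

```r
library(quanteda)

# Invented headlines for illustration (assumption).
headlines <- c(
  h1 = "Leaders praise historic progress at summit",
  h2 = "Activists condemn weak and disappointing deal"
)

toks <- tokens(headlines, remove_punct = TRUE)

# Keep only the "negative" and "positive" keys, ignoring the negated sets.
lsd_basic <- data_dictionary_LSD2015[1:2]

# Replace tokens by the dictionary key they match, then count keys per headline.
lsd_counts <- dfm(tokens_lookup(toks, dictionary = lsd_basic))
convert(lsd_counts, to = "data.frame")
```

Each row of the resulting table gives the number of negative and positive matches in one headline, which can then be compared with the counts obtained from the tidytext dictionaries.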
As explained before, the ‘Bing et al.’ dictionary, included in the tidytext package, classifies words as positive or negative.
Applying this dictionary to our dataset assigned sentiment values to 295 words, which are distributed over the examined time period. The graph above clearly shows that the count of words with either a positive or a negative sentiment increased during the negotiations (October 31st – November 12th, the zone delimited by the two vertical bars in the graph). Yet, from this graph we cannot deduce any trend in the general sentiment of the headlines of “The Guardian”.
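A minimal sketch of this step, assuming a tidy table with one word per row and a `date` column (the four rows are invented for illustration), joins the words to the Bing lexicon and counts positive and negative matches per day:

```r
library(dplyr)
library(tidytext)

# Invented tokenised headlines, one word per row (assumption).
tidy_headlines <- tibble(
  date = as.Date(c("2021-11-01", "2021-11-01", "2021-11-02", "2021-11-02")),
  word = c("protest", "success", "poor", "leaders")
)

bing_counts <- tidy_headlines %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only classified words
  count(date, sentiment)                               # words per day and polarity

bing_counts
```

Words that do not appear in the lexicon, such as ‘leaders’ here, are silently dropped by the inner join, which is why only a fraction of the headline words ends up with a sentiment label.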
The most repeated words with a positive or a negative sentiment value can be seen in the barplot below.
To give a more detailed view of the results, two interactive graphs follow: the first shows the count of each type of word by date, and the second shows the count of classified words only, again by date.
Through this plot, one can clearly notice the surge in words published during COP26, and that most of the words in The Guardian’s headlines were not classified.
This surge in word counts and the prevalence of non-classified words, especially during COP, could have been expected: during this period, a local news outlet would cover the negotiations extensively and report on them in a rather non-subjective way.
In this second graph, only classified words are shown. It is clear that negativity prevails over positivity in the headlines, especially after COP26. As seen among the frequently occurring negative words, words like “protest”, “limit” and “poor” are mentioned often, which leads us to ask the following questions: In the context of COP26, are protests negative? Don’t protest movements like Fridays for Future have a positive impact on the climate crisis? How negative and biasing are these words?
To better understand how negative or positive the words used in The Guardian’s headlines are, we use the AFINN dictionary, which assigns values from -5 (very negative) to +5 (very positive) to the words it classifies.
The plotly package is used to visualize the values of negativity and positivity. This package offers you the option to compare data by hovering over the points, so you can examine the word count by date for each sentiment value.
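The scoring step can be sketched as below. To keep the example runnable offline it uses a small stand-in lexicon, `afinn_like`, whose words and values are illustrative and not taken from AFINN; in practice `get_sentiments("afinn")` from tidytext returns a table with the same `word` and `value` columns, but it requires the textdata package and a one-time download. The input rows are also invented.

```r
library(dplyr)

# Invented tokenised headlines, one word per row (assumption).
tidy_headlines <- tibble(
  date = as.Date(c("2021-11-01", "2021-11-01", "2021-11-02", "2021-11-02")),
  word = c("protest", "success", "poor", "leaders")
)

# Stand-in for the AFINN lexicon with illustrative values (assumption).
# Real use: afinn <- tidytext::get_sentiments("afinn")
afinn_like <- tibble(
  word  = c("protest", "poor", "success"),
  value = c(-2, -2, 2)
)

afinn_by_date <- tidy_headlines %>%
  inner_join(afinn_like, by = "word") %>%  # attach a score to classified words
  count(date, value, name = "n_words")     # word count per day and score

afinn_by_date
```

A table of this shape, with one row per date and sentiment value, is what a hover-enabled scatter or bar plot would take as input.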
Not as bad as expected, right? The negative trend seen in the results of the Bing et al. dictionary looks better now that we see that the vast majority of the classified words fall in the -2 to +2 score range.
It is remarkable that not a single word was assigned a value of -5 or a value of +5. The world leaders and The Guardian did not overkill the expectations of the people and, on the other hand, showed no toxic positivity!
The AFINN dictionary was applied in a similar way to the headlines related to Boris Johnson. These seem to sit on the more negative side of the spectrum before, during and after COP26, yet Johnson seems to navigate well between positivity and negativity.
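One possible way to set up such a comparison is sketched below: headlines mentioning Johnson are selected, labelled as falling before, during or after the Conference (31 October to 12 November 2021), and scored. The headlines, the filtering rule and the stand-in lexicon `afinn_like` with its values are all assumptions made for illustration, not the procedure actually used in the analysis.

```r
library(dplyr)
library(tidytext)

# Invented headlines for illustration (assumption).
headlines <- tibble(
  date = as.Date(c("2021-10-25", "2021-11-05", "2021-11-20")),
  headline = c(
    "Johnson warns of failure at summit",
    "Boris Johnson hails success of forest pledge",
    "Talks ended without Australia"
  )
)

# Stand-in for the AFINN lexicon with illustrative values (assumption).
afinn_like <- tibble(word = c("failure", "success"), value = c(-2, 2))

johnson_scores <- headlines %>%
  filter(grepl("johnson", headline, ignore.case = TRUE)) %>%  # targeted subset
  mutate(period = case_when(
    date < as.Date("2021-10-31")  ~ "before",
    date <= as.Date("2021-11-12") ~ "during",
    TRUE                          ~ "after"
  )) %>%
  unnest_tokens(word, headline) %>%
  inner_join(afinn_like, by = "word") %>%
  group_by(period) %>%
  summarise(mean_value = mean(value), .groups = "drop")

johnson_scores
```

Averaging by period gives a compact summary, although with few matching headlines per period the means rest on very few words and should be read with caution.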